EFFICIENCY-CONSCIOUS PROPOSITIONALIZATION FOR RELATIONAL LEARNING Part Two: Boosting Efficiency

Authors

  • Filip Železný
  • Olga Štěpánková
Abstract

Systems aiming at discovering interesting knowledge in data, now commonly called data mining systems, are typically employed to find patterns in a single relational table. Most mainstream data mining tools are not applicable to the more challenging task of finding knowledge in structured data represented by a multi-relational database. Although the family of methods known as inductive logic programming was developed to tackle that challenge directly, the idea of converting structured data into a simpler form digestible by the wealth of attribute-value learning (AVL) systems has always been tempting to data miners. To this end, we present a method based on constructing first-order logic features that performs this kind of conversion, also known as propositionalization. It incorporates some basic principles suggested in previous research and provides significant enhancements that lead to remarkable improvements in the efficiency of the feature-construction process. In the first part, we motivated the propositionalization task with an illustrative example, reviewed some previous approaches to propositionalization, and formalized the concept of a first-order feature, elaborating mainly on the points that influence the efficiency of the designed feature-construction algorithm. In this second part, we describe several pruning mechanisms that further improve the efficiency of the algorithm, and present its implementation as well as an empirical assessment on one artificial and one real-world dataset.
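To make the idea of propositionalization concrete, the following is a minimal sketch, not the paper's own feature-construction algorithm. It assumes a hypothetical molecule/atom relational schema and three hand-written first-order features (all names and data are illustrative); each feature is an existentially quantified condition over an example's related records, and evaluating all features on all examples yields a single attribute-value table usable by any propositional learner.

    # Hypothetical relational data: a "main" table of examples (molecules) and a
    # related table of their atoms (one-to-many relationship).
    molecules = {"m1": {"active": True}, "m2": {"active": False}}
    atoms = [
        {"molecule": "m1", "element": "c", "charge": 0.2},
        {"molecule": "m1", "element": "o", "charge": -0.4},
        {"molecule": "m2", "element": "h", "charge": 0.1},
    ]

    # First-order features as predicates over an example's related records;
    # e.g. "there exists an atom of element o" or "there exists an atom with
    # negative charge".
    features = {
        "has_oxygen": lambda mol_atoms: any(a["element"] == "o" for a in mol_atoms),
        "has_neg_charge": lambda mol_atoms: any(a["charge"] < 0 for a in mol_atoms),
        "has_carbon": lambda mol_atoms: any(a["element"] == "c" for a in mol_atoms),
    }

    # Propositionalize: evaluate every feature on every example, producing a
    # flat boolean table that an attribute-value learner can consume.
    table = []
    for mol_id, props in molecules.items():
        mol_atoms = [a for a in atoms if a["molecule"] == mol_id]
        row = {name: feat(mol_atoms) for name, feat in features.items()}
        row["class"] = props["active"]
        table.append(row)

    for row in table:
        print(row)

In an actual propositionalization system the features are not hand-written but enumerated automatically from a declarative language bias, which is exactly why the efficiency of feature construction, addressed in this paper, matters.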


Similar Articles

Scaling Boosting by Margin-Based Inclusion of Features and Relations

Boosting is well known to increase the accuracy of propositional and multi-relational classification learners. However, the base learner’s efficiency vitally determines boosting’s efficiency since the complexity of the underlying learner is amplified by iterated calls of the learner in the boosting framework. The idea of restricting the learner to smaller feature subsets in order to increase ef...


Active relational rule learning in a constrained confidence rated boosting framework

In this dissertation, I investigate the potential of boosting within the framework of relational rule learning. Boosting is a particularly robust and powerful technique to enhance the prediction accuracy of systems that learn from examples. Although boosting has been extensively studied in the last years for propositional learning systems, only little attention has been paid to boosting in rela...


Trading Expressivity for Efficiency in Statistical Relational Learning

Statistical relational learning (SRL) combines state-of-the-art statistical modeling with relational representations. It thereby promises to provide effective machine learning techniques for domains that cannot adequately be described using a propositional representation. Driven by new applications in which data is structured, interrelated, and heterogeneous, this area of machine learning has r...


Combining Gradient Boosting Machines with Collective Inference to Predict Continuous Values

Gradient boosting of regression trees is a competitive procedure for learning predictive models of continuous data that fits the data with an additive non-parametric model. The classic version of gradient boosting assumes that the data is independent and identically distributed. However, relational data with interdependent, linked instances is now common and the dependencies in such data can be...
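For readers unfamiliar with the additive model mentioned above, here is a minimal sketch of gradient boosting for regression under the classic i.i.d. assumption, fitting shallow trees to residuals. The data, hyperparameters, and use of scikit-learn stumps are illustrative assumptions, not taken from the cited work.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

    learning_rate = 0.1
    n_rounds = 100

    # Start from the constant model minimizing squared error (the mean), then
    # repeatedly fit a shallow tree to the current residuals and add it to the
    # ensemble: F_m(x) = F_{m-1}(x) + learning_rate * h_m(x).
    prediction = np.full_like(y, y.mean())
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction          # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)

    print("training MSE:", np.mean((y - prediction) ** 2))

The cited work extends this scheme to relational data, where the i.i.d. assumption no longer holds and collective inference over linked instances is used instead.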


Discriminative Learning for Label Sequences via Boosting

This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function. The proposed method combines many of the advantages of boosting schemes with the efficiency of dynamic programming methods and is attractive both conceptually and computationally. In addition, we also discuss alternative approaches based on the Hamming loss for labe...




Publication year: 2004